A Semi-supervised Heat Kernel Pagerank MBO Algorithm for Data Classification

Authors

  • EKATERINA MERKURJEV
  • ANDREA L. BERTOZZI
  • FAN CHUNG
Abstract

We present an efficient semi-supervised graph-based algorithm for classification of high-dimensional data that is motivated by the MBO method of Garcia-Cardona (2014) and derived using the similarity graph. Our procedure combines heat kernel pagerank with the MBO method and applies the result to semi-supervised problems. The running time of our algorithm depends strongly on how quickly the pagerank can be computed; we use two different, efficient approaches to calculate the pagerank, one of which proceeds by simulating random walks of bounded length. Overall, our method is advantageous for very large, sparse data sets, in which the graph has few edges, and it produces good accuracy even if the number of labeled instances is very small. The accuracy of the procedure, demonstrated on benchmark data sets, is comparable with or better than that of state-of-the-art methods. In addition to experimental results, we include a thorough comparison of our algorithm to that of Garcia-Cardona (2014) and describe the advantages of both methods.
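As a rough illustration of the random-walk approach mentioned in the abstract, the sketch below estimates a heat kernel pagerank vector by simulating walks of bounded length. It is not the authors' implementation. It assumes the commonly used definition rho_{t,s} = e^{-t} * sum_k (t^k / k!) * s P^k, where P is the random-walk transition matrix, and approximates it by drawing each walk length from a Poisson(t) distribution truncated at a maximum length. The names `adj`, `seed`, `t`, `max_len`, and `n_walks` are hypothetical and chosen only for this example.

```python
import math
import random
from collections import Counter


def sample_poisson(t, rng):
    """Draw k ~ Poisson(t) with Knuth's multiplication method (suitable for small t)."""
    threshold = math.exp(-t)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def approx_heat_kernel_pagerank(adj, seed, t=3.0, max_len=10, n_walks=20000, rng=None):
    """Monte Carlo estimate of heat kernel pagerank from a single seed node.

    adj     : dict mapping each node to a list of its neighbors (unweighted graph)
    seed    : starting node (the seed distribution is a point mass here)
    t       : heat kernel "temperature" parameter
    max_len : upper bound on walk length (truncation is an assumption of this sketch)
    """
    rng = rng or random.Random(0)
    hits = Counter()
    for _ in range(n_walks):
        # Walk length is Poisson(t), capped so every walk has bounded length.
        steps = min(sample_poisson(t, rng), max_len)
        node = seed
        for _ in range(steps):
            neighbors = adj[node]
            if not neighbors:  # isolated node: the walk stays put
                break
            node = rng.choice(neighbors)
        hits[node] += 1  # record where the walk ended
    # The fraction of walks ending at each node estimates its pagerank value.
    return {v: hits[v] / n_walks for v in adj}


if __name__ == "__main__":
    # Two 4-node cliques joined by the single edge 3-4.
    adj = {
        0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4],
        4: [3, 5, 6, 7], 5: [4, 6, 7], 6: [4, 5, 7], 7: [4, 5, 6],
    }
    scores = approx_heat_kernel_pagerank(adj, seed=0)
    for node in sorted(scores):
        print(node, round(scores[node], 3))
```

In a semi-supervised setting, one plausible use of such scores is to run the estimate once per labeled seed and compare the resulting values at each unlabeled node; how the paper couples the pagerank with the MBO thresholding step is not described in the abstract, so that part is left out here.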


Related resources

A Fast MBO Scheme for Multiclass Data Classification

We describe a new variant of the MBO scheme for solving the semi-supervised data classification problem on a weighted graph. The scheme is based on the minimization of the graph heat content energy. The resulting algorithms guarantee dissipation of the graph heat content energy for an extremely wide class of weight matrices. As a result our method is both flexible and unconditionally stable. Ex...


Graph-based semi-supervised learning methods and quick detection of central nodes. (Méthodes d'apprentissage semi-supervisé basé sur les graphes et détection rapide des nœuds centraux)

Semi-supervised learning methods constitute a category of machine learning methods which use labelled points together with unlabelled data to tune the classifier. The main idea of the semi-supervised methods is based on an assumption that the classification function should change smoothly over a similarity graph, which represents relations among data points. This idea can be expressed using ker...


Composite Kernel Optimization in Semi-Supervised Metric

Machine-learning solutions to classification, clustering and matching problems critically depend on the adopted metric, which in the past was selected heuristically. In the last decade, it has been demonstrated that an appropriate metric can be learnt from data, resulting in superior performance as compared with traditional metrics. This has recently stimulated a considerable interest in the to...


High-quality Training Data Selection using Latent Topics for Graph-based Semi-supervised Learning

In a multi-class document categorization using graph-based semi-supervised learning (GBSSL), it is essential to construct a proper graph expressing the relation among nodes and to use a reasonable categorization algorithm. Furthermore, it is also important to provide high-quality correct data as training data. In this context, we propose a method to construct a similarity graph by employing bot...


Accurate Semi-supervised Classification for Graph Data

Most machine learning algorithms require labeled instances as training data; however, not all instances are equally easy to obtain labels for. For example, the best-known papers and/or websites would be easier for a domain expert to label. We propose a new PageRank-style method for performing semi-supervised learning by propagating labels from labeled seed instances to unlabeled instances in a ...



Publication date: 2016